9 research outputs found

    Automated Dynamic Resource Provisioning and Monitoring in Virtualized Large-Scale Datacenter

    Infrastructure as a Service (IaaS) is a pay-as-you-go cloud provisioning model that outsources physical servers, guest virtual machine (VM) instances, storage resources, and networking connections on demand. This article reports the design and development of our proposed symbiotic-simulation-based system to support the automated management of an IaaS-based distributed virtualized data center. To make the ideas work in practice, we have implemented an OpenStack-based open source cloud computing platform. A benchmarking application, the "Cloud Rapid Experimentation and Analysis Tool" (CBTool), is used to measure the resource allocation potential of our test cloud system. The real-time benchmarking metrics of the cloud are fed to a distributed multi-agent-based intelligence middleware layer. To optimally control the dynamic operation of the prototype data center, we predefine custom policies for VM provisioning and application performance profiling within the cloud modeling and simulation toolkit "CloudSim". Both tools used in our prototype implementation can scale up to thousands of VMs; our mechanism is therefore highly scalable and can be flexibly extended to large-scale deployments. The autonomic characteristics of the agents help streamline the symbiosis between the simulation system and the IaaS cloud in a closed feedback control loop. The practical value of the multi-agent-based technology lies in the fact that it is inherently scalable and can therefore be implemented efficiently within a complex cloud computing environment. To demonstrate the efficacy of our approach, we have deployed a lightweight representative scenario for monitoring and provisioning virtual machines within the test bed. Experimental results indicate a notable improvement in the resource provisioning profile of the virtualized data center when our proposed strategy is incorporated.

    Correlated Set Coordination in Fault Tolerant Message Logging Protocols

    Based on our current expectations for exascale systems, composed of hundreds of thousands of many-core nodes, the mean time between failures will become small, even under the most optimistic assumptions. Message logging, one of the most scalable checkpoint-restart techniques, is the most challenged when the number of cores per node increases, due to the high overhead of saving the message payload. Fortunately, for two processes on the same node, the failure probability is correlated, meaning that coordinated recovery is free. In this paper, we propose an intermediate approach that uses coordination between correlated processes, but retains the scalability advantage of message logging between independent ones. The algorithm still belongs to the family of event logging protocols, but eliminates the need for costly payload logging between coordinated processes.
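
    The central decision described in this abstract (skip payload logging between correlated processes, keep it between independent ones) can be sketched as below. This is a minimal illustration under stated assumptions, not the paper's implementation: it assumes a correlated set is exactly the set of processes on one node, and the names `node_of`, `needs_payload_logging`, and `count_logged_payloads` are hypothetical.

```python
def needs_payload_logging(sender, receiver, node_of):
    """Return True when the message payload must be logged.

    Assumption: processes on the same node fail together, so they
    recover through coordination and need no payload log between them.
    """
    return node_of[sender] != node_of[receiver]


def count_logged_payloads(messages, node_of):
    """Count the (sender, receiver) messages whose payload would be logged."""
    return sum(
        1
        for sender, receiver in messages
        if needs_payload_logging(sender, receiver, node_of)
    )


# Hypothetical layout: ranks 0 and 1 share node "A", rank 2 runs on node "B".
node_of = {0: "A", 1: "A", 2: "B"}
messages = [(0, 1), (1, 0), (0, 2), (2, 1)]
logged = count_logged_payloads(messages, node_of)
```

    In this toy layout only the two messages that cross the node boundary would carry the payload-logging cost; the intra-node messages would still be covered by event logging.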

    A MULTIPROTOCOL AUTOMATIC

    High performance computing platforms such as clusters, grids, and desktop grids are becoming larger and subject to more frequent failures. MPI is one of the most widely used message passing libraries in HPC applications. These two trends raise the need for fault-tolerant MPI. The MPICH-V project focuses on designing, implementing, and comparing several automatic fault-tolerant protocols for MPI applications. We present an extensive related work section highlighting the originality of our approach and the proposed protocols. We then present four fault-tolerant protocols implemented in a new generic framework for fault-tolerant protocol comparison, covering a large spectrum of known approaches from coordinated checkpoint, to uncoordinated checkpoint associated with causal messag

    Improving Message Logging Protocols Scalability through Distributed Event Logging

    Message logging is an attractive solution for providing fault tolerance in message passing applications because it is more scalable than coordinated checkpointing. Sender-based message logging is a well-known optimization that saves message payloads in the sender's memory, so that only the events corresponding to message receptions have to be logged reliably using an event logger. In existing work on message logging, the event logger has always been considered a centralized process, which limits the scalability of message logging protocols. In this paper, we propose a distributed event logger. This new event logger takes advantage of multi-core processors to execute in parallel with application processes. It uses the nodes' volatile memory to save events reliably. We propose a simple gossip-based dissemination protocol to make application processes aware of new stable events. We evaluated our distributed event logger in the Open MPI library with an optimistic and a pessimistic message logging protocol. Experiments show that distributed event logging improves the scalability of message logging protocols.
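
    The sender-based logging idea in this abstract (payloads stay in sender memory, only reception events are logged) can be sketched as below. This is a simplified illustration, not the paper's protocol: for brevity it uses a single in-memory dictionary as the event log, which corresponds to the centralized design the paper argues against rather than the distributed, gossip-based logger it proposes. All class and function names are hypothetical.

```python
class Process:
    """A toy process that keeps every sent payload in its own memory."""

    def __init__(self, rank):
        self.rank = rank
        self.sent_payloads = {}  # (dest_rank, seq) -> payload
        self.next_seq = {}       # dest_rank -> next sequence number

    def send(self, dest, payload):
        seq = self.next_seq.get(dest.rank, 0)
        self.next_seq[dest.rank] = seq + 1
        # Sender-based logging: the payload is saved on the sender side.
        self.sent_payloads[(dest.rank, seq)] = payload
        return (self.rank, seq, payload)


def deliver(event_log, receiver_rank, message):
    """Log only the reception event (sender, seq), not the payload."""
    sender_rank, seq, payload = message
    event_log.setdefault(receiver_rank, []).append((sender_rank, seq))
    return payload


def replay(event_log, receiver_rank, processes):
    """Rebuild the receiver's deliveries in their original order."""
    return [
        processes[sender].sent_payloads[(receiver_rank, seq)]
        for sender, seq in event_log.get(receiver_rank, [])
    ]


# Hypothetical run: ranks 0 and 1 both send to rank 2.
procs = {rank: Process(rank) for rank in range(3)}
m_a = procs[0].send(procs[2], "a")
m_b = procs[1].send(procs[2], "b")
m_c = procs[0].send(procs[2], "c")

event_log = {}
for msg in (m_b, m_a, m_c):  # the order in which rank 2 happened to receive
    deliver(event_log, 2, msg)

recovered = replay(event_log, 2, procs)
```

    After a simulated failure of rank 2, `replay` fetches each payload from the surviving senders and returns them in the logged reception order, which is what makes the re-execution deterministic.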